acoustic signal
Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models
Zhang, Xijie, He, Fengliang, Dai, Hong-Ning
Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cost, sensitivity to lighting conditions, and privacy leakage concerns. Acoustic sensing provides an attractive alternative: by emitting inaudible high-frequency signals and capturing their reflections, the channel impulse response (CIR) encodes how gestures perturb the acoustic field in a low-cost and user-transparent manner. However, existing CIR-based gesture recognition methods often rely on extensively training models on large labeled datasets, making them unsuitable for few-shot VR scenarios. In this work, we propose the first framework that leverages large language models (LLMs) for CIR-based gesture recognition in VR/AR systems. Despite LLMs' strengths, achieving few-shot and zero-shot learning of CIR gestures is non-trivial because gesture-induced CIR features are inconspicuous. To tackle this challenge, we collect differential CIR rather than raw CIR data. Moreover, we construct a real-world dataset collected from 10 participants performing 15 gestures across three categories (digits, letters, and shapes), with 10 repetitions each. We then conduct extensive experiments on this dataset using an LLM-based classifier. Results show that our LLM-based framework achieves accuracy comparable to classical machine learning baselines while requiring no domain-specific retraining.
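The paper's key preprocessing step, differencing consecutive CIR frames so that static reflections cancel and only gesture-induced perturbations remain, is straightforward to reproduce. A minimal sketch, assuming CIR magnitudes arrive as a (frames × taps) array; the text serialization for the LLM prompt is an illustrative choice of ours, not the authors' format:

```python
import numpy as np

def differential_cir(cir: np.ndarray) -> np.ndarray:
    """Frame-to-frame difference of CIR magnitudes.

    cir: array of shape (n_frames, n_taps) holding |CIR| per frame.
    Static reflections cancel in the difference, leaving only the
    gesture-induced changes in the acoustic field.
    """
    return np.diff(np.abs(cir), axis=0)

def to_prompt(dcir: np.ndarray, decimals: int = 3) -> str:
    """Serialize a differential-CIR snippet for a few-shot LLM prompt.
    (Illustrative encoding; the paper does not specify its format.)"""
    rows = [" ".join(f"{v:.{decimals}f}" for v in frame) for frame in dcir]
    return "\n".join(rows)

# Example: 50 frames x 16 taps of synthetic CIR magnitudes
rng = np.random.default_rng(0)
cir = rng.random((50, 16))
print(to_prompt(differential_cir(cir)[:2]))
```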
- Asia > China > Hong Kong (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada (0.04)
- Education (0.68)
- Health & Medicine (0.46)
WaveVerif: Acoustic Side-Channel based Verification of Robotic Workflows
Erdogan, Zeynep Yasemin, Nagaraja, Shishir, Ahmed, Chuadhry Mujeeb, Shah, Ryan
In this paper, we present a framework that uses acoustic side-channel analysis (ASCA) to monitor and verify whether a robot correctly executes its intended commands. We develop and evaluate a machine-learning-based workflow verification system that uses acoustic emissions generated by robotic movements. The system can determine whether real-time behavior is consistent with expected commands. The evaluation takes into account movement speed, direction, and microphone distance. The results show that individual robot movements can be validated with over 80% accuracy under baseline conditions using four different classifiers: Support Vector Machine (SVM), Deep Neural Network (DNN), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN). Additionally, workflows such as pick-and-place and packing could be identified with similarly high confidence. Our findings demonstrate that acoustic signals can support real-time, low-cost, passive verification in sensitive robotic environments without requiring hardware modifications.
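As a rough illustration of the verification stage, here is a minimal sketch of one of the four evaluated classifiers (the SVM) trained on features of the acoustic emissions; the MFCC features and hyperparameters are our assumptions, since the abstract does not specify them:

```python
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def movement_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Mean MFCC vector for one recorded acoustic emission."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)  # summarize frames into one feature vector

# X: one feature row per recording; y: movement label (e.g., "joint1_cw").
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
# clf.fit(X_train, y_train)
# print("movement accuracy:", clf.score(X_test, y_test))
```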
- Asia > Middle East > Republic of Türkiye (0.41)
- Europe > United Kingdom (0.40)
- North America > United States > District of Columbia > Washington (0.05)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Energy (1.00)
- Government > Regional Government > Asia Government > Middle East Government > Republic of Türkiye Government (0.41)
Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment
Hoang, Long-Vu, Nguyen, Tuan, Dat, Tran Huy
This paper presents a novel non-invasive object classification approach using acoustic scattering, demonstrated through a case study on hair assessment. When an incident wave interacts with an object, it generates a scattered acoustic field encoding the object's structural and material properties. By emitting acoustic stimuli and capturing the signals scattered from head-with-hair-sample objects, we classify hair type and moisture using AI-driven, deep-learning-based sound classification. We benchmark comprehensive methods, including (i) fully supervised deep learning, (ii) embedding-based classification, (iii) supervised foundation model fine-tuning, and (iv) self-supervised model fine-tuning. Our best strategy achieves nearly 90% classification accuracy by fine-tuning all parameters of a self-supervised model. These results highlight acoustic scattering as a privacy-preserving, non-contact alternative to visual classification, with significant potential for applications across industries.
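The best-performing strategy, fine-tuning all parameters of a self-supervised model, follows a standard recipe. A hedged sketch using torchaudio's wav2vec 2.0 bundle as a stand-in backbone (the abstract does not name the actual model); `n_classes` is a hypothetical label count:

```python
import torch
import torch.nn as nn
import torchaudio

# Stand-in self-supervised backbone; the paper's actual model is unspecified.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
encoder = bundle.get_model()

n_classes = 4  # hypothetical number of hair type/moisture classes
head = nn.Linear(768, n_classes)  # WAV2VEC2_BASE feature dim is 768

# Strategy (iv): unfreeze *all* encoder parameters, not just the head.
params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

def train_step(waveforms: torch.Tensor, labels: torch.Tensor) -> float:
    feats, _ = encoder.extract_features(waveforms)  # per-layer outputs
    pooled = feats[-1].mean(dim=1)                  # mean-pool over time
    loss = loss_fn(head(pooled), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```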
AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis
Panteli, Eirini, Santos, Paulo E., Humphrey, Nabil
This paper presents AquaSignal, a modular and scalable pipeline for preprocessing, denoising, classification, and novelty detection of underwater acoustic signals. Designed to operate effectively in noisy and dynamic marine environments, AquaSignal integrates state-of-the-art deep learning architectures to enhance the reliability and accuracy of acoustic signal analysis. The system is evaluated on a combined dataset from the DeepShip and Ocean Networks Canada (ONC) benchmarks, providing a diverse set of real-world underwater scenarios. AquaSignal employs a U-Net architecture for denoising, a ResNet18 convolutional neural network for classifying known acoustic events, and an autoencoder-based model for unsupervised detection of novel or anomalous signals. To our knowledge, this is the first comprehensive study to apply and evaluate this combination of techniques on maritime vessel acoustic data. Experimental results show that AquaSignal improves signal clarity and task performance, achieving 71% classification accuracy and 91% accuracy in novelty detection. Although classification performance is slightly lower than that of some state-of-the-art models, differences in data partitioning strategies limit direct comparison. Overall, AquaSignal demonstrates strong potential for real-time underwater acoustic monitoring in scientific, environmental, and maritime domains.
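Of the three stages, the unsupervised novelty detector is the simplest to sketch: an autoencoder trained only on known-class inputs flags signals whose reconstruction error exceeds a threshold chosen on validation data. A minimal version under our own assumptions about input shape (the abstract gives no architecture details):

```python
import torch
import torch.nn as nn

class SpectrogramAE(nn.Module):
    """Tiny autoencoder over flattened spectrogram patches (assumed 64x64)."""
    def __init__(self, n_in: int = 64 * 64, n_latent: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(),
                                 nn.Linear(256, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_in))

    def forward(self, x):
        return self.dec(self.enc(x))

@torch.no_grad()
def is_novel(model: SpectrogramAE, x: torch.Tensor,
             threshold: float) -> torch.Tensor:
    """Flag samples whose per-sample MSE exceeds a validation threshold."""
    err = ((model(x) - x) ** 2).mean(dim=1)
    return err > threshold
```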
- North America > Canada (0.25)
- North America > United States (0.14)
- Europe > Netherlands > South Holland > Rotterdam (0.14)
- Transportation > Marine (1.00)
- Energy (1.00)
- Government > Military (0.68)
A multi-head deep fusion model for recognition of cattle foraging events using sound and movement signals
Ferrero, Mariano, Chelotti, José Omar, Martinez-Rau, Luciano Sebastián, Vignolo, Leandro, Pires, Martín, Galli, Julio Ricardo, Giovanini, Leonardo Luis, Rufiner, Hugo Leonardo
Monitoring feeding behaviour is a relevant task for efficient herd management and the effective use of available resources in grazing cattle. Automatically recognising animals' feeding activities through the identification of specific jaw movements enables improved diet formulation and early detection of metabolic problems and symptoms of animal discomfort, among other benefits. The use of sensors to obtain signals for such monitoring has become popular in the last two decades. The most frequently employed sensors include accelerometers, microphones, and cameras, each with its own advantages and drawbacks. A largely unexplored aspect is the simultaneous use of multiple sensors, combining their signals to enhance the precision of the estimates. In this direction, this work introduces a deep neural network based on the fusion of acoustic and inertial signals, composed of convolutional, recurrent, and dense layers. The main advantage of this model is that it combines the signals by automatically extracting features from each of them independently. The model emerged from an exploration and comparison of different neural network architectures proposed in this work, which perform information fusion at different levels; feature-level fusion outperformed data-level and decision-level fusion by at least 0.14 in F1-score. Moreover, a comparison with state-of-the-art machine learning methods is presented, including traditional and deep learning approaches. The proposed model yielded an F1-score of 0.802, a 14% increase over previous methods. Finally, results from an ablation study and a post-training quantization evaluation are also reported.
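The winning feature-level fusion design can be pictured as one convolutional branch per modality whose outputs are concatenated before the dense classifier. A schematic PyTorch sketch under assumed input shapes and layer sizes (the abstract does not specify them):

```python
import torch
import torch.nn as nn

class FeatureFusionNet(nn.Module):
    """Per-modality conv branches -> concatenation -> dense classifier.
    Input lengths (1 s windows at assumed rates) are illustrative only."""
    def __init__(self, n_classes: int = 3):
        super().__init__()
        # Acoustic branch: 1-channel waveform input.
        self.audio = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32), nn.Flatten())
        # Inertial branch: 3-axis accelerometer input.
        self.imu = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32), nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear(16 * 32 * 2, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, audio, imu):
        # Feature-level fusion: concatenate learned features per modality.
        z = torch.cat([self.audio(audio), self.imu(imu)], dim=1)
        return self.head(z)

# logits = FeatureFusionNet()(torch.randn(8, 1, 16000), torch.randn(8, 3, 100))
```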
- South America > Argentina (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Sweden (0.04)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.46)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
Indoor Drone Localization and Tracking Based on Acoustic Inertial Measurement
Sun, Yimiao, Wang, Weiguo, Mottola, Luca, Jia, Zhang, Wang, Ruijin, He, Yuan
We present Acoustic Inertial Measurement (AIM), a one-of-a-kind technique for indoor drone localization and tracking. Indoor drone localization and tracking remain a crucial yet unsolved challenge: in GPS-denied environments, existing approaches have limited applicability, especially in Non-Line of Sight (NLoS) conditions, require extensive environment instrumentation, or demand considerable hardware/software changes on drones. In contrast, AIM exploits the acoustic characteristics of drones to estimate their location and derive their motion, even in NLoS settings. We tame location estimation errors using a dedicated Kalman filter and the Interquartile Range (IQR) rule, and demonstrate that AIM can support indoor spaces with arbitrary ranges and layouts. We implement AIM using an off-the-shelf microphone array and evaluate its performance with a commercial drone under varied settings. Results indicate that the mean localization error of AIM is 46% lower than that of commercial UWB-based systems in a complex 10 m × 10 m indoor scenario, where state-of-the-art infrared systems would not even work because of NLoS conditions. When distributed microphone arrays are deployed, the mean error can be reduced to less than 0.5 m over a 20 m range.
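The error-taming step named in the abstract, IQR-based outlier rejection followed by Kalman filtering, can be sketched with a constant-velocity model; the noise covariances below are illustrative choices of ours, not the paper's tuned values:

```python
import numpy as np

def iqr_filter(points: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Drop location estimates outside [Q1 - k*IQR, Q3 + k*IQR] per axis."""
    q1, q3 = np.percentile(points, [25, 75], axis=0)
    iqr = q3 - q1
    mask = np.all((points >= q1 - k * iqr) & (points <= q3 + k * iqr), axis=1)
    return points[mask]

def kalman_track(zs: np.ndarray, dt: float = 0.1) -> np.ndarray:
    """Constant-velocity Kalman filter over 2D position measurements."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)  # state transition
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)  # measure position only
    Q, R = np.eye(4) * 1e-3, np.eye(2) * 0.25          # assumed noise levels
    x, P = np.zeros(4), np.eye(4)
    track = []
    for z in zs:
        x, P = F @ x, F @ P @ F.T + Q                  # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (z - H @ x)                        # update
        P = (np.eye(4) - K @ H) @ P
        track.append(x[:2].copy())
    return np.array(track)
```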
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds
Shibata, Yuto, Oumi, Yusuke, Irie, Go, Kimura, Akisato, Aoki, Yoshimitsu, Isogawa, Mariko
We propose BGM2Pose, a non-invasive 3D human pose estimation method that uses arbitrary music (e.g., background music) as the active sensing signal. Unlike existing approaches, which significantly limit practicality by employing intrusive chirp signals within the audible range, our method utilizes natural music that causes minimal discomfort to humans. Estimating human poses from ordinary music presents significant challenges: in contrast to sound sources specifically designed for measurement, regular music varies in both volume and pitch, and these dynamic signal changes are inevitably mixed with the alterations in the sound field caused by human motion, making it hard to extract reliable cues for pose estimation. To address these challenges, BGM2Pose introduces a Contrastive Pose Extraction Module that employs contrastive learning and hard negative sampling to eliminate musical components from the recorded data, isolating the pose information. Additionally, we propose a Frequency-wise Attention Module that enables the model to focus on subtle acoustic variations attributable to human movement by dynamically computing attention across frequency bands. Experiments suggest that our method outperforms existing methods, demonstrating substantial potential for real-world applications. Our datasets and code will be made publicly available.
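The Contrastive Pose Extraction Module is described only at a high level, but contrastive learning with hard negatives is commonly implemented as an InfoNCE-style loss. A generic sketch under assumed embedding shapes, with music-only recordings as one plausible source of hard negatives:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE loss: pull pose-related embeddings together, push away
    hard negatives (e.g., music-only recordings of the same track).

    anchor, positive: (B, D); negatives: (B, N, D). Shapes and the
    sampling scheme are our assumptions, not the paper's specification.
    """
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos = (a * p).sum(-1, keepdim=True) / tau        # (B, 1)
    neg = torch.einsum("bd,bnd->bn", a, n) / tau     # (B, N)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(len(a), dtype=torch.long)   # positive is index 0
    return F.cross_entropy(logits, labels)
```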
- Information Technology (0.68)
- Leisure & Entertainment (0.66)
- Media > Music (0.48)
Software implemented fault diagnosis of natural gas pumping unit based on feedforward neural network
Kozlenko, Mykola, Zamikhovska, Olena, Zamikhovskyi, Leonid
In recent years, increasing attention has been paid to the use of artificial neural networks (ANNs) for the diagnostics of gas pumping units (GPUs). Usually, ANN training is carried out on models of GPU workflows, with generated sets of diagnostic data used to simulate defect conditions; the results obtained do not allow assessing the real state of the GPU. We propose instead to use the characteristics of the GPU's acoustic and vibration processes as the ANN's input data. A descriptive statistical analysis of real vibration and acoustic processes generated by the operation of a GPU of type GTK-25-i (Nuovo Pignone, Italy) was carried out, and packets of diagnostic features arriving at the ANN input were formed. The diagnostic features are the five maximum-amplitude components of the acoustic and vibration signals, together with the standard deviation of each sample. They are computed directly in the ANN's input data pipeline in real time for three technical states of the GPU. Using TensorFlow, Keras, NumPy, and pandas in Python 3, we developed a deep, fully connected feedforward ANN trained with the error-backpropagation algorithm. The results of training and testing the developed ANN are presented. During testing over all 1475 signal samples, classification precision was 1.0000 for the "nominal" state, 0.9853 for the "current" state, and 0.9091 for the "defective" state. The developed ANN classifies the technical states of the GPU with accuracy sufficient for practical use, helping to prevent GPU failures, and can be used to diagnose GPUs of any type and power rating.
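The described feature vector (the five maximum-amplitude spectral components plus the sample's standard deviation) and a fully connected Keras classifier are easy to sketch; the layer sizes below are our assumptions, as the exact architecture is not given here:

```python
import numpy as np
import tensorflow as tf

def diagnostic_features(signal: np.ndarray) -> np.ndarray:
    """Five largest FFT magnitude components plus the sample's std."""
    mag = np.abs(np.fft.rfft(signal))
    top5 = np.sort(mag)[-5:][::-1]
    return np.concatenate([top5, [signal.std()]])  # 6 features per sample

# Deep fully connected feedforward net over the 6-feature vectors,
# trained by backpropagation; 3 outputs for nominal/current/defective.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(6,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=50, validation_split=0.2)
```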
- Europe > Italy (0.24)
- Europe > Ukraine (0.14)
- North America > United States > California (0.14)
SonicBoom: Contact Localization Using Array of Microphones
Lee, Moonyoung, Yoo, Uksang, Oh, Jean, Ichnowski, Jeffrey, Kantor, George, Kroemer, Oliver
In cluttered environments where visual sensors encounter heavy occlusion, such as in agricultural settings, tactile signals can provide crucial spatial information for the robot to locate rigid objects and maneuver around them. We introduce SonicBoom, a holistic hardware and learning pipeline that enables contact localization through an array of contact microphones. While conventional sound source localization methods effectively triangulate sources in air, localization through solid media with irregular geometry and structure presents challenges that are difficult to model analytically. We address this challenge through a feature-engineering and learning-based approach, autonomously collecting 18,000 robot interaction sound pairs to learn a mapping between acoustic signals and collision locations on the robot end-effector link. By leveraging relative features between microphones, SonicBoom achieves localization errors of 0.42cm for in-distribution interactions and maintains robust performance of 2.22cm error even with novel objects and contact conditions. We demonstrate the system's practical utility through haptic mapping of occluded branches in mock canopy settings, showing that acoustics-based sensing can enable reliable robot navigation in visually challenging environments.
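One plausible reading of "relative features between microphones" is pairwise cross-correlation lags (TDOA-style cues) fed to a learned regressor; the sketch below is our interpretation, not the authors' exact feature set:

```python
import numpy as np
from itertools import combinations
from sklearn.neural_network import MLPRegressor

def relative_lag_features(mics: np.ndarray) -> np.ndarray:
    """Peak cross-correlation lag for every microphone pair.

    mics: (n_mics, n_samples) recording of one tap from the contact-mic
    array. Relative lags depend on where the collision occurred, so they
    serve as location cues even in irregular solid media.
    """
    feats = []
    for i, j in combinations(range(len(mics)), 2):
        xc = np.correlate(mics[i], mics[j], mode="full")
        feats.append(np.argmax(xc) - (len(mics[j]) - 1))  # signed lag
    return np.array(feats, float)

# X: one lag-feature row per tap; y: (x, y, z) contact point on the link.
# reg = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)
```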
- Europe > Spain > Galicia > Madrid (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Acoustic-based 3D Human Pose Estimation Robust to Human Position
Oumi, Yusuke, Shibata, Yuto, Irie, Go, Kimura, Akisato, Aoki, Yoshimitsu, Isogawa, Mariko
This paper explores the problem of 3D human pose estimation from only low-level acoustic signals. The existing active acoustic sensing-based approach to 3D human pose estimation implicitly assumes that the target user is positioned along a line between the loudspeakers and a microphone. Because sound reflected or diffracted by the human body causes only subtle acoustic signal changes compared to direct obstruction, the existing model's accuracy degrades significantly when subjects deviate from this line, limiting its practicality in real-world scenarios. To overcome this limitation, we propose a novel method composed of a position discriminator and a reverberation-resistant model. The former predicts the standing positions of subjects and applies adversarial learning to extract position-invariant features. The latter uses acoustic signals recorded before the estimation target time as references to enhance robustness against variations in sound arrival times due to diffraction and reflection. We construct an acoustic pose estimation dataset that covers diverse human locations and demonstrate through experiments that our proposed method outperforms existing approaches.
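Adversarial extraction of position-invariant features is typically implemented with a gradient reversal layer between the feature extractor and the position discriminator. A generic sketch of that pattern (the paper may use a different adversarial formulation):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients on the way back,
    so minimizing the discriminator loss *maximizes* position confusion
    in the upstream feature extractor."""
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def adversarial_losses(features, pose_head, pos_disc,
                       pose_gt, pos_gt, lam: float = 0.1):
    """Pose regression loss plus reversed position-classification loss."""
    pose_loss = nn.functional.mse_loss(pose_head(features), pose_gt)
    pos_logits = pos_disc(GradReverse.apply(features, lam))
    pos_loss = nn.functional.cross_entropy(pos_logits, pos_gt)
    return pose_loss + pos_loss
```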